Fix #933: raise compaction memory_limit to 4 GB#952
Merged
Conversation
#942 lowered the cap to 1 GB on the theory that a tight memory_limit plus temp_directory would force DuckDB to spill earlier and keep peak working set down. That validation ran against query_stats (narrow, ~1.7M rows) and showed peak 1236 MB → 166 MB. The reporter's actual failure is on query_snapshots, which carries query_text + query_plan + live_query_plan per row. With the 1 GB cap, the nightly logs show OOM at "906/953 MiB used" before any merge progress. The standalone reproducer (tools/CompactionRepro) confirms the cause: parquet COPY in DuckDB v1.5.2 makes allocations that bypass the buffer manager and can't be spilled. The cap acts as a hard ceiling for those, not a spill trigger. Spill on disk = 0 MB across every configuration we tested (memory_limit 1/2/4 GB, accumulator vs tournament merge, threads 1 vs 2, :memory: vs file-backed DB). The same failure reproduces in standalone DuckDB CLI v1.5.2, so it's an engine issue — see upstream issues duckdb#16482 and discussion#10084. DuckDB's own OOM guide explicitly warns about this case and recommends memory_limit at 50-60% of system RAM, not a tight cap. 4 GB sits well inside that range for typical workstation/server hosts and leaves real headroom on top of the un-spillable allocations. Reporter's actual file sizes (15-25 chunks of 2-6 MB plus a 35-45 MB monthly file per group) are well below the level where 4 GB has any trouble. The reproducer confirms 4 GB succeeds on a synthetic query_snapshots-shaped dataset of ~1.5 GB with peak working set of ~400 MB; the reporter's data is ~143 MB at worst. Also updates the stale comment about spilling — temp_directory was set per #935 but the buffer-manager-bypassing allocations don't use it. The comment now describes what actually happens. The tools/CompactionRepro changes add --strategy {accumulator|tournament}, --db-mode {memory|file}, --merge-files, --synthetic data generation, and --cycles for leak testing. These are kept so a future regression in this area can be reproduced and diagnosed quickly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
pull Bot
pushed a commit
to ehtick/PerformanceMonitor
that referenced
this pull request
May 20, 2026
…nsiently for COPY erikdarlingdata#933's titled complaint is "Memory usage on client": Lite holds ~2.7-2.9 GB after 10 minutes with 4 servers. The compaction OOMs everyone has been chasing in this thread are a downstream symptom — by the time compaction runs the app already holds 2.7 GB, leaving little headroom on the reporter's 16 GB / ~1.6 GB-free machine. Root cause: the main DuckDB ConnectionString set no memory_limit, so the buffer pool ran at the DuckDB default of 80% of system RAM (~12.8 GB on a 16 GB box). With archive parquet files accumulating on disk, every UI query over an archive view caches pages and the buffer pool grows freely. The fix has to navigate one wrinkle: parquet COPY in DuckDB v1.5.2 hits a buffer-manager-bypass pre-reservation that needs ~2-4 GB headroom. Capping the main connection at 1 GB statically would break ExportToParquet and the two COPY paths in ArchiveAllAndResetAsync. So: - ConnectionString: memory_limit=1GB (caps resting buffer pool — addresses the actual complaint by stopping the archive-page cache from growing unbounded). - Around each parquet COPY on the main connection: SET memory_limit='4GB', run the COPY, SET back to '1GB'. Factored into a WithRaisedCopyMemoryLimit helper so the three call sites stay consistent (ExportToParquet, and the two COPYs in ArchiveAllAndResetAsync). - Compaction connections (separate :memory: instances) keep their 4 GB cap from erikdarlingdata#952. Verified against DuckDB CLI v1.5.2 with synthetic query_snapshots-shaped data: - COPY table→parquet at 256MB/512MB/1GB: OOMs (pre-reservation, matches the read_parquet→parquet path we saw in erikdarlingdata#952 testing). - COPY table→parquet at 2GB/4GB: succeeds, peak RSS well under cap. - INSERT (Appender) and SELECT (including GROUP BY across 11k rows) work fine at 256MB cap — confirms collectors and UI queries don't have the pre-reservation behavior and aren't affected by the resting cap. Tradeoff: the resting cap forces buffer-pool eviction of cached archive parquet pages. Long-range historical UI queries that re-scan many parquet files will do more disk I/O. Live/recent-data queries against the hot DB are unaffected (hot DB is small enough to fit in 1 GB easily). Plus the per-merge-step BuildSelectClause from the previous commit fixes the separate query_store_stats Binder Error on archives that span the v13 schema change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
memory_limitfor parquet compaction from1GB→4GBinLite/Services/ArchiveService.cs(both single-pass and pair-merge code paths)tools/CompactionReprowith stress-test scaffolding for future regressionsWhy
#942 dropped the cap to 1 GB on the theory that a tighter limit +
temp_directorywould force earlier spilling and keep peak working set down. That validation ran onquery_stats(narrow, ~1.7M rows). The reporter's failure is onquery_snapshots, which carriesquery_text+query_plan+live_query_planper row. Nightly logs (#933) show OOM at906/953 MiB usedwith the 1 GB cap.Standalone reproducer pinpoints the cause: parquet
COPYin DuckDB v1.5.2 makes allocations that bypass the buffer manager and can't be spilled. The cap acts as a hard ceiling for those, not a spill trigger. Spill on disk = 0 MB across every configuration tested (memory_limit 1/2/4 GB; accumulator vs tournament merge; threads 1 vs 2;:memory:vs file-backed DB). Same failure reproduces in standalone DuckDB CLI v1.5.2 — engine issue, not a binding artifact. Upstream: duckdb#16482, duckdb#10084.DuckDB's OOM guide explicitly warns about this and recommends
memory_limitat 50-60% of system RAM. 4 GB sits well inside that range and leaves real headroom on top of the un-spillable allocations.Reporter's actual file sizes (provided in #933): 15-25 chunks of 2-6 MB plus a 35-45 MB monthly file per group, totaling ~97-143 MB per merge. Comfortably below where 4 GB has any trouble.
Test plan
dotnet build Lite/PerformanceMonitorLite.csproj -c Release— cleantools/CompactionReprosucceeds at 4 GB on synthetic data shaped likequery_snapshots:Notes
tools/CompactionReprogained--strategy {accumulator|tournament},--db-mode {memory|file},--merge-files,--synthetic(with--synthetic-rowsand--synthetic-plan-kb), and--cyclesfor leak testing. Kept so a future regression in this area is easy to reproduce.🤖 Generated with Claude Code